A model of random mass-matching and its use for automated significance testing in mass spectrometric proteome analysis.

Authors

  • Jan Eriksson
  • David Fenyö
Abstract

A rapid and accurate method for testing the significance of protein identities determined by mass spectrometric analysis of protein digests and genome database searching is presented. The method is based on direct computation using a statistical model of the random matching of measured and theoretical proteolytic peptide masses. Protein identification algorithms typically rank the proteins of a genome database according to a score based on the number of matches between the masses obtained by mass spectrometry analysis and the theoretical proteolytic peptide masses of a database protein. The random matching of experimental and theoretical masses can cause false results. A result is significant only if the score characterizing the result deviates significantly from the score expected from a false result. A distribution of the score (number of matches) for random (false) results is computed directly from our model of the random matching, which allows significance testing under any experimental and database search constraints. In order to mimic protein identification data quality in large-scale proteome projects, low-to-high quality proteolytic peptide mass data were generated in silico and subsequently submitted to a database search program designed to include significance testing based on direct computation. This simulation procedure demonstrates the usefulness of direct significance testing for automatically screening for samples that must be subjected to peptide sequence analysis by e.g. tandem mass spectrometry in order to determine the protein identity.
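The idea of testing a score against the distribution expected for false results can be illustrated with a minimal sketch. It assumes that each measured mass matches a random database protein independently with a fixed probability `p_match`, so that the number of random matches is binomial. That simplification, and the names `random_match_pmf`, `p_value`, `is_significant`, and `alpha`, are illustrative assumptions and not the model described in the paper.

```python
from math import comb


def random_match_pmf(n_masses: int, p_match: float) -> list[float]:
    """Probability of k random matches, for k = 0..n_masses.

    Assumes (hypothetically) that each of the n_masses measured masses
    matches a false protein independently with probability p_match.
    """
    return [
        comb(n_masses, k) * p_match**k * (1.0 - p_match) ** (n_masses - k)
        for k in range(n_masses + 1)
    ]


def p_value(observed_matches: int, n_masses: int, p_match: float) -> float:
    """Probability that a false result scores at least observed_matches."""
    pmf = random_match_pmf(n_masses, p_match)
    return sum(pmf[observed_matches:])


def is_significant(
    observed_matches: int, n_masses: int, p_match: float, alpha: float = 0.05
) -> bool:
    """True if the score is unlikely (below alpha) to arise from random matching."""
    return p_value(observed_matches, n_masses, p_match) < alpha


if __name__ == "__main__":
    # Illustrative numbers only: 20 measured masses, 5% chance of a random match each.
    for k in (1, 4, 8):
        print(k, p_value(k, 20, 0.05), is_significant(k, 20, 0.05))
```

In this sketch, a result that fails the test would be the kind of sample flagged for further peptide sequence analysis; in practice `p_match` would depend on the mass accuracy and database search constraints.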


Similar resources

Proteome analysis of Cryptosporidium parvum and C. hominis using two-dimentional electrophoresis, image analysis and tandem mass spectrometry

Until recently, Cryptosporidium was thought to be a single-species genus. Molecular studies now show that there are at least 10 valid species of this parasite. Among them, two morphologically identical species, C. hominis and C. parvum, are the most pathogenic identified to date and share 97% of identical genomes. Post-genomic analysis is therefore necessary to explore further the...


A statistical basis for testing the significance of mass spectrometric protein identification results.

A method for testing the significance of mass spectrometric (MS) protein identification results is presented. MS proteolytic peptide mapping and genome database searching provide a rapid, sensitive, and potentially accurate means for identifying proteins. Database search algorithms detect the matching between proteolytic peptide masses from an MS peptide map and theoretical proteolytic peptide ...


Probity: a protein identification algorithm with accurate assignment of the statistical significance of the results.

An algorithm for protein identification based on mass spectrometric proteolytic peptide mapping and genome database searching is presented. The algorithm ranks database proteins based on direct calculation of the probability of random matching and assigns the statistical significance to each result. We investigate the performance of the algorithm by simulation and show that the algorithm respon...


Protein identification in complex mixtures.

This paper investigates the prospects of successful mass spectrometric protein identification based on mass data from proteolytic digests of complex protein mixtures. Sets of proteolytic peptide masses representing various numbers of digested proteins in a mixture were generated in silico. In each set, different proteins were selected from a protein sequence collection and for each protein the ...


Assigning significance to peptides identified by tandem mass spectrometry using decoy databases.

Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discover...



Journal:
  • Proteomics

Volume 2, Issue 3

Publication date: 2002